Temporal Abstraction in TD Networks
Authors
Abstract
Temporal-difference (TD) networks have been proposed as a way of representing and learning a wide variety of predictions about the interaction between an agent and its environment (Sutton & Tanner, 2005). These predictions are compositional in that their targets are defined in terms of other predictions, and subjunctive in that they are about what would happen if an action or sequence of actions were taken. In conventional TD networks, the inter-related predictions are at successive time steps and contingent on a single action; here we generalize them to accommodate extended time intervals and contingency on whole ways of behaving. Our generalization is based on the options framework for temporal abstraction (Sutton, Precup & Singh, 1999). The primary contribution of this paper is to introduce a new algorithm for intra-option learning in TD networks with function approximation and eligibility traces. We present empirical examples of our algorithm’s effectiveness and of the greater representational expressiveness of temporally-abstract TD networks.
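To make the compositional and subjunctive properties concrete, here is a minimal sketch of one step of a conventional (one-step, single-action) TD-network update with linear function approximation. The question network (`targets`), conditioning actions (`cond_action`), and all parameter values are hypothetical illustrations, not the paper's actual experimental setup; the temporally-abstract (option-conditional) generalization the paper introduces is not shown.

```python
import numpy as np

n_nodes, n_features = 3, 4
alpha = 0.1                               # step size (illustrative)

# targets[i] names the node whose next-step prediction node i forecasts;
# -1 means node i predicts the raw next observation (compositionality).
targets = [-1, 0, 1]                      # hypothetical question network
cond_action = [0, 0, 1]                   # action each node is contingent on

def predict(W, x):
    return W @ x                          # y_i(t) = w_i . x(t), one row per node

def td_network_step(W, x, x_next, obs_next, action):
    """One TD(0) update of the answer network's weights W."""
    y = predict(W, x)
    y_next = predict(W, x_next)
    for i in range(n_nodes):
        if action != cond_action[i]:      # subjunctive: only update nodes whose
            continue                      # conditioning action was actually taken
        z = obs_next if targets[i] == -1 else y_next[targets[i]]
        W[i] += alpha * (z - y[i]) * x    # move prediction toward its target
    return W
```

Note how node 1's target is node 0's next-step prediction, so knowledge is defined in terms of other predictions rather than only raw observations.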
Similar papers
TD Models: Modeling the World at a Mixture of Time Scales
Temporal-difference (TD) learning can be used not just to predict rewards, as is commonly done in reinforcement learning, but also to predict states, i.e., to learn a model of the world's dynamics. We present theory and algorithms for intermixing TD models of the world at different levels of temporal abstraction within a single structure. Such multi-scale TD models can be used in model-based rein...
Using Decision Trees as the Answer Networks in Temporal Difference-Networks
State representation for intelligent agents is a continuous challenge as the need for abstraction is unavoidable in large state spaces. Predictive representations offer one way to obtain state abstraction by replacing a state with a set of predictions about future interactions with the world. One such formalism is the Temporal-Difference Networks framework [2]. It splits the representation of k...
TD(λ) Networks: Temporal-Difference Networks with Eligibility Traces
Temporal-difference (TD) networks have been introduced as a formalism for expressing and learning grounded world knowledge in a predictive form (Sutton & Tanner, 2005). Like conventional TD(0) methods, the learning algorithm for TD networks uses 1-step backups to train prediction units about future events. In conventional TD learning, the TD(λ) algorithm is often used to do more general multi-s...
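The multi-step backups that TD(λ) adds over 1-step TD(0) can be sketched with a standard accumulating eligibility-trace update for a single linear prediction. Function and parameter names here are illustrative, not from the cited paper:

```python
import numpy as np

def td_lambda_update(w, e, x, r, x_next, alpha=0.1, gamma=0.9, lam=0.8):
    """One TD(lambda) step: trace e spreads the TD error over past features."""
    delta = r + gamma * (w @ x_next) - (w @ x)   # one-step TD error
    e = gamma * lam * e + x                      # accumulating eligibility trace
    w = w + alpha * delta * e                    # credit all recently active features
    return w, e
```

With `lam=0`, `e` reduces to the current feature vector `x` and the update collapses to the 1-step TD(0) backup used in the original TD-network learning algorithm.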
Automatic Acquisition of TD-Network in POMDP Environments: Extension with SRN Structure (牧野貴樹)
We propose a new neural network architecture, Simple recurrent TD Networks (SR-TDNs), that learns to predict future observations in partially observable environments, using proto-predictive representation of states. SR-TDNs incorporate the structure of simple recurrent neural networks (SRNs) into temporal-difference (TD) networks to use proto-predictive representation of states. Our simulation ...